Annotation Scheme and Gold Standard for Dutch Subjective Adjectives
Abstract
Many techniques have been developed to automatically derive lexical resources for opinion mining. In this paper we present a gold standard for Dutch adjectives, developed for the evaluation of these techniques. In the first part of the paper we introduce our annotation guidelines. They are based on guidelines recently developed for English, which annotate subjectivity and polarity at the word sense level. In addition to subjectivity and polarity, we propose a third annotation category: that of the attitude holder. The identity of the attitude holder is partly implied by the word itself and may provide useful information for opinion mining systems. In the second part of the paper we present the criteria adopted for selecting the items to be included in this gold standard. Our design aims at an equal representation of all dimensions of the lexicon, such as frequency and polysemy, in order to create a gold standard that can be used not only for benchmarking but may also help to systematically improve the methods that derive the word lists. Finally, we present the results of the annotation task, including annotator agreement rates and a disagreement analysis.
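The abstract reports annotator agreement rates but does not say which measure was used. As a minimal sketch, assuming two annotators and Cohen's kappa as the measure (an assumption, not a detail taken from the paper), agreement on one annotation category such as polarity could be computed as follows. The function name `cohen_kappa` and the label values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical polarity labels for six adjective senses (illustrative data only).
annotator_1 = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
annotator_2 = ["positive", "negative", "neutral", "negative", "negative", "positive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

With these made-up labels the annotators agree on four of six items, and kappa comes out at roughly 0.5 once chance agreement is discounted. The same function could be applied separately to each annotation category (subjectivity, polarity, attitude holder).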
Similar resources
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create, and hence less widespread. Such manually created reference alignments – also called Gold Standards – have been used in research projects to develop or test automatic word alignment s...
Predicative Adjectives: An Unsupervised Criterion to Extract Subjective Adjectives
We examine predicative adjectives as an unsupervised criterion to extract subjective adjectives. We compare this criterion not only with a weakly supervised extraction method but also with gradable adjectives, i.e. another highly subjective subset of adjectives that can be extracted in an unsupervised fashion. In order to prove the robustness of this extraction method, we will evaluate the e...
Identifying Subjective Adjectives through Web-based Mutual Information
This paper describes a method for ranking a large list of adjectives according to a subjectivity score without resorting to any knowledge-intensive external resources (such as lexical databases, parsers or manual annotation). The method only requires a list of adjectives to be ranked and a small set of “seeds” (manually selected subjective adjectives). The subjectivity score is obtained by comp...
Acquisition of Subjective Adjectives with Limited Resources
This paper describes a bootstrapping algorithm for acquiring a lexicon of subjective adjectives which minimizes the recourse to external resources (such as lexical databases, parsers, manual annotation work). The method only employs a corpus tagged with part-of-speech information and a seed set of subjective adjectives. The list of candidate subjective adjectives is generated incrementally by lo...